[SPARK-39259][SQL][3.2] Evaluate timestamps consistently in subqueries #36753
olaky wants to merge 3 commits into apache:branch-3.2
Conversation
Can one of the admins verify this patch?
There are some Python errors in the build, which look to me like a problem with the pandas or Python versions in the build pipeline. In any case, I really doubt that I broke those tests.
Looking at the GitHub Actions runs for commits in https://github.com/apache/spark/commits/branch-3.2, they have been broken for some time already. cc @HyukjinKwon @ueshin @itholic
Let me fix it.
Let's wait for #36759 and then re-trigger the build via an empty commit.
@olaky Could you re-trigger the builds, please?
Force-pushed from 63cc981 to ae808ea
dongjoon-hyun left a comment:
Hi, folks. Please hold off on this.
This patch seems to break Scala 2.13 in master/branch-3.3 (including RC4).
Here is the PR to fix the Scala 2.13 issue.
When updating this PR, let's also pull in my changes from #36765. When merging this, we should probably cherry-pick it all the way back to 3.0, since it looks like a correctness issue that would also impact those versions.
…` in `ComputeCurrentTimeSuite`

### What changes were proposed in this pull request?

Unfortunately, apache#36654 causes seven Scala 2.13 test failures in master/3.3 and Apache Spark 3.3 RC4. This PR aims to fix the Scala 2.13 `ClassCastException` in the test code.

### Why are the changes needed?

```
$ dev/change-scala-version.sh 2.13
$ build/sbt "catalyst/testOnly *.ComputeCurrentTimeSuite" -Pscala-2.13
...
[info] ComputeCurrentTimeSuite:
[info] - analyzer should replace current_timestamp with literals *** FAILED *** (1 second, 189 milliseconds)
[info]   java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to scala.collection.immutable.Seq
[info]   at org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.literals(ComputeCurrentTimeSuite.scala:146)
[info]   at org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.$anonfun$new$1(ComputeCurrentTimeSuite.scala:47)
...
[info] *** 7 TESTS FAILED ***
[error] Failed tests:
[error]   org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite
[error] (catalyst / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 189 s (03:09), completed Jun 3, 2022 10:29:39 AM
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual tests with Scala 2.13.

```
$ dev/change-scala-version.sh 2.13
$ build/sbt "catalyst/testOnly *.ComputeCurrentTimeSuite" -Pscala-2.13
...
[info] ComputeCurrentTimeSuite:
[info] - analyzer should replace current_timestamp with literals (545 milliseconds)
[info] - analyzer should replace current_date with literals (11 milliseconds)
[info] - SPARK-33469: Add current_timezone function (3 milliseconds)
[info] - analyzer should replace localtimestamp with literals (4 milliseconds)
[info] - analyzer should use equal timestamps across subqueries (182 milliseconds)
[info] - analyzer should use consistent timestamps for different timezones (13 milliseconds)
[info] - analyzer should use consistent timestamps for different timestamp functions (2 milliseconds)
[info] Run completed in 1 second, 579 milliseconds.
[info] Total number of tests run: 7
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Jun 3, 2022, 10:54:03 AM
```

Closes apache#36762 from dongjoon-hyun/SPARK-39259.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
JoshRosen left a comment:
LGTM for backport pending tests.
For committers: whoever merges this should also try to merge to 3.1 and 3.0.
I guess the failure is not related to this PR's changes: the last commits in https://github.com/apache/spark/commits/branch-3.2 fail in the same way.
+1, LGTM. Merging to 3.2 and trying to merge to 3.1/3.0.
### What changes were proposed in this pull request?

Apply the optimizer rule ComputeCurrentTime consistently across subqueries. This is a backport of #36654.

### Why are the changes needed?

At the moment, timestamp functions like now() can return different values within a query if subqueries are involved.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test was added.

Closes #36753 from olaky/SPARK-39259-spark_3_2.

Lead-authored-by: Ole Sasse <ole.sasse@databricks.com>
Co-authored-by: Josh Rosen <joshrosen@databricks.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
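As background for the correctness issue described above, the idea behind the rule can be sketched outside of Spark. The following is a minimal, hypothetical Python model (none of these names are Spark APIs; `Node`, `compute_current_time`, and `literals` are invented for illustration): it shows why evaluating the timestamp once per query and substituting that single literal into both the main plan and all subquery plans makes every occurrence agree.

```python
# Illustrative sketch only -- these names are invented for this example and
# are not Spark APIs. It models the intent of the ComputeCurrentTime rule:
# evaluate the current timestamp exactly once per query and substitute that
# one literal everywhere, including inside subquery plans.
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    subqueries: List["Node"] = field(default_factory=list)  # nested plans

def compute_current_time(plan, now=None):
    """Replace every current_timestamp node with one shared literal."""
    if now is None:
        now = time.time_ns()  # evaluated once, at the root of the query
    name = f"literal({now})" if plan.name == "current_timestamp" else plan.name
    return Node(
        name,
        [compute_current_time(c, now) for c in plan.children],
        [compute_current_time(s, now) for s in plan.subqueries],
    )

def literals(plan):
    """Collect literal values from a plan, descending into subqueries too."""
    out = [plan.name] if plan.name.startswith("literal(") else []
    for child in plan.children + plan.subqueries:
        out.extend(literals(child))
    return out

# A plan with current_timestamp in the main query and in a subquery.
plan = Node("project", [Node("current_timestamp")],
            [Node("subquery", [Node("current_timestamp")])])
optimized = compute_current_time(plan)
# Both occurrences resolve to the same literal after the rewrite.
assert len(set(literals(optimized))) == 1
```

The buggy behaviour fixed here corresponds to calling `compute_current_time` separately on each subquery plan, so each traversal picks its own `now` value and the literals disagree.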
@olaky The changes cause conflicts in branch-3.1. Could you open PRs with backports to 3.1 and 3.0, please?
Merging is blocked because of a test failure that also surfaces in #36386.
@MaxGekk Since you closed this, should I still work on propagating this to 3.1 and 3.0? And how should we deal with the test failures happening on branch-3.2?
@MaxGekk branch-3.1 does not yet have the transform-with-subqueries function, so I would have to backport that as well. I personally feel this could be a risky endeavour that is not worth it just to get this fix; what do you think?
Could you list the required PRs, please? Is it possible to extract only the needed functions from them?
And for 3.0 we have #36822 (it looks to me like there is an infrastructure problem with building it).
Thank you for making a backport PR to